feat: diff-scoped change-pack (impacted subgraph + affected tests + cost estimate) with CLI/MCP parity #234

Merged: theagenticguy merged 7 commits into main from feat/change-pack on Jun 14, 2026

@theagenticguy (Owner) commented Jun 14, 2026

What

A new deterministic, diff-scoped change-pack capability — exposed at full parity as the change_pack MCP tool and the codehub change-pack CLI command. Given a git range, it returns one object with four sections:

  1. Impacted subgraph — the union of per-changed-symbol upstream fan-outs, retained (not collapsed to a scalar the way verdict does internally), deduped, deterministically capped.
  2. Verdict — delegates to computeVerdict verbatim (5-tier, exit code 0/1/2).
  3. Affected tests — the tests that transitively exercise the changed symbols. These already existed in the graph but impact.ts classified-then-dropped them; this surfaces them (reachedFromSymbol + depth, sorted).
  4. Cost attribution — real o200k_base BPE token counts (via gpt-tokenizer) for the scoped pack vs. a blind "read every impacted file" baseline, plus ciTestsSkipped. estimate: false / tokenizer-model: openai/o200k_base.
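
As a rough illustration of section 3, the affected-test selection could be sketched as below. `ReachedNode`, `looksLikeTestPath`, and `selectAffectedTests` are hypothetical names made up for this example; the repo's real `isTestPath` predicate and graph types are not shown in this description, so the path rules here are an assumption.

```typescript
// Hypothetical shape of a node reached by the upstream walk from a changed symbol.
interface ReachedNode {
  id: string;
  filePath: string;
  reachedFromSymbol: string;
  depth: number;
}

// Stand-in for the repo's isTestPath predicate (assumed rules, not the real ones).
function looksLikeTestPath(filePath: string): boolean {
  return (
    /(^|\/)(__tests__|tests?)\//.test(filePath) ||
    /\.(test|spec)\.[cm]?[jt]sx?$/.test(filePath)
  );
}

// Ordinal string comparison: unlike localeCompare, it does not depend on the host locale.
function compareStrings(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Keep only nodes that live in test files, sorted by (filePath, id).
function selectAffectedTests(reached: ReachedNode[]): ReachedNode[] {
  return reached
    .filter((node) => looksLikeTestPath(node.filePath))
    .sort((a, b) => compareStrings(a.filePath, b.filePath) || compareStrings(a.id, b.id));
}
```

Sorting with plain code-unit comparison rather than a locale-aware collator is one way to keep the output byte-identical across machines.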

Why

OpenCodeHub's bet against the agentic-grep-first consensus is sharpest in CI: the diff is known, the run is unattended, and re-exploration is pure token cost. change-pack answers "what does this change touch, how risky is it, which tests must run, and what did scoping it save" in one deterministic call — the CI-oriented complement to the interactive tools.

CLI ↔ MCP parity (hard requirement)

Both surfaces delegate to the single @opencodehub/analysis.runChangePack core, so they cannot disagree on values. The first cross-surface parity test in the repo proves it, split to respect package rootDir:

  • CLI test: --json deep-equals the raw ChangePack (pure passthrough).
  • MCP test: toStructured recases losslessly (recase-back == raw pack).

Together: identical values across both surfaces. Exit code = verdict.exitCode, so CI gates match codehub verdict.
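
The lossless-recase half of the parity check can be pictured with the sketch below. It is a simplification under stated assumptions: `recaseKeys`, `camelToSnake`, and `snakeToCamel` are hypothetical helpers, every key is recased recursively (the real `toStructured` only recases selected interiors), and the sample values `nodeCount` and `truncated` are invented for illustration.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

const camelToSnake = (key: string): string =>
  key.replace(/[A-Z]/g, (ch) => "_" + ch.toLowerCase());

const snakeToCamel = (key: string): string =>
  key.replace(/_([a-z])/g, (_match, ch: string) => ch.toUpperCase());

// Rebuild a JSON value with every object key passed through `recase`.
function recaseKeys(value: Json, recase: (key: string) => string): Json {
  if (Array.isArray(value)) return value.map((item) => recaseKeys(item, recase));
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const key of Object.keys(value)) {
      out[recase(key)] = recaseKeys(value[key], recase);
    }
    return out;
  }
  return value;
}

// Invented sample pack; only `estimate` and `tokenizerModel` are names from this PR.
const rawPack: Json = {
  impactedSubgraph: { nodeCount: 2, truncated: false },
  costAttribution: { estimate: false, tokenizerModel: "openai/o200k_base" },
};

const structured = recaseKeys(rawPack, camelToSnake);
const roundTripped = recaseKeys(structured, snakeToCamel);
```

The round trip only holds when the raw keys are plain camelCase; a raw key that already contains an underscore followed by a lowercase letter would not survive it, which is the kind of regression a recase-back assertion would catch.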

Hard rails honored

Self-hosted OSS · stdio MCP only · no LLM in the query path · no IDE/web-UI · byte-deterministic (canonical-JSON, pre-sorted collections, content hash). Reads the graph only — no re-derived edges (inherits SCIP-precise resolution), no graph mutation, no new dependencies.
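
To make the byte-determinism rail concrete, here is a minimal sketch of canonical JSON serialization with sorted object keys. `canonicalJson` is a hypothetical name, the input is assumed to contain only JSON values, and the repo's actual canonicalizer and hash algorithm are not shown in this description.

```typescript
// Serialize a JSON value with object keys in sorted order, so two objects
// that differ only in key insertion order produce identical bytes.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) {
    // Array order is meaningful and is preserved; callers pre-sort collections.
    return "[" + value.map((item) => canonicalJson(item)).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const record = value as Record<string, unknown>;
    const keys = Object.keys(record).sort();
    const members = keys.map((key) => JSON.stringify(key) + ":" + canonicalJson(record[key]));
    return "{" + members.join(",") + "}";
  }
  return JSON.stringify(value);
}
```

A content hash computed over such a string depends only on the values, not on how the object happened to be constructed.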

Validation

mise run check exits 0 — lint (685 files), typecheck, full test suite (all 17 packages, 0 failures), banned-strings. New coverage: 11 analysis core + determinism tests, MCP envelope + parity tests, CLI surface tests. Server tool-roster contract bumped 28 → 29.

Notes / follow-ups

  • Real o200k_base token counts — uses gpt-tokenizer (pure-JS, MIT, zero-dep), chosen over native/WASM tiktoken to honor the no-native-binding rail (ADR 0015). The …/encoding/o200k_base subpath bundles its BPE ranks inline, so encode is synchronous and deterministic (the content hash + byte-identity hold). A len/4 heuristic remains only as a throw-fallback. (The repo's @opencodehub/pack tokenizerId pin is still provenance-only — change-pack now does its own real counting.)
  • affectedTests uses the narrow isTestPath predicate; the multi-language isTestFile/pairedTestCandidates in ingestion is a candidate to consolidate later.
  • The repo-root CLAUDE.md "28 tools" prose line is now stale (29) — left for a docs sweep.
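
The throw-fallback from the first note can be sketched as follows. In the PR the encoder is `encode` from the `gpt-tokenizer/encoding/o200k_base` subpath; here it is passed in as a parameter so the sketch runs without that dependency. The injected-encoder signature is an assumption of this example, not the PR's actual `countTokens` signature.

```typescript
// An encoder maps text to a list of token ids.
type Encoder = (text: string) => number[];

// Count tokens with the real encoder; if it throws, fall back to the
// deterministic max(1, ceil(len/4)) heuristic so cost attribution never crashes.
function countTokens(text: string, encode: Encoder): number {
  try {
    return encode(text).length;
  } catch {
    return Math.max(1, Math.ceil(text.length / 4));
  }
}
```

Because both the encoder path and the fallback are pure functions of the input text, the same input always yields the same count.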

Built via ERPAVal (spec .erpaval/specs/007-change-pack/).

Generalizes computeVerdict's internal diff->upstream fan-out but RETAINS the
impacted subgraph instead of collapsing it to a scalar blastRadius, and surfaces
the affected tests that runImpact classifies-then-drops today.

runChangePack(store, query) composes four sections:
- impacted subgraph: union of per-changed-symbol runImpact(upstream) fan-outs,
  deduped (nodes by id keeping min depth, edges by from/type/to), capped at 5000
  nodes deterministically; production-only by default, includeTestsInSubgraph
  retains tests.
- verdict: delegates to computeVerdict verbatim.
- affected tests: upstream reachability filtered by isTestPath, each carrying
  reachedFromSymbol + depth, sorted (filePath, id).
- cost attribution: char-heuristic token estimate (len/4; the repo ships no real
  tokenizer, tokenizerId is provenance-only) for the pack body vs a blind
  baseline summing every impacted file; self-labeled estimate:true with
  tokenizer-model char-heuristic-v1. Never claims model tokens.
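
The dedup-and-cap rule for the impacted subgraph can be sketched as below. All type and function names are hypothetical. The cap order (depth, then id), the dropping of edges whose endpoints were cut, and the `truncated` flag are assumptions; the commit only states that the cap is deterministic.

```typescript
interface SubgraphNode { id: string; depth: number; }
interface SubgraphEdge { from: string; type: string; to: string; }
interface Fanout { nodes: SubgraphNode[]; edges: SubgraphEdge[]; }

const cmp = (a: string, b: string): number => (a < b ? -1 : a > b ? 1 : 0);

function mergeFanouts(fanouts: Fanout[], maxNodes: number): Fanout & { truncated: boolean } {
  const nodesById = new Map<string, SubgraphNode>();
  const edgesByKey = new Map<string, SubgraphEdge>();
  for (const fanout of fanouts) {
    for (const node of fanout.nodes) {
      // Dedupe nodes by id, keeping the minimum depth seen.
      const prev = nodesById.get(node.id);
      if (prev === undefined || node.depth < prev.depth) nodesById.set(node.id, node);
    }
    for (const edge of fanout.edges) {
      // Dedupe edges by the (from, type, to) triple.
      edgesByKey.set(JSON.stringify([edge.from, edge.type, edge.to]), edge);
    }
  }
  // Sort before capping so the same input always keeps the same nodes.
  const sorted = Array.from(nodesById.values()).sort(
    (a, b) => a.depth - b.depth || cmp(a.id, b.id),
  );
  const nodes = sorted.slice(0, maxNodes);
  const kept = new Set(nodes.map((n) => n.id));
  const edges = Array.from(edgesByKey.values())
    .filter((e) => kept.has(e.from) && kept.has(e.to))
    .sort((a, b) => cmp(a.from, b.from) || cmp(a.type, b.type) || cmp(a.to, b.to));
  return { nodes, edges, truncated: sorted.length > maxNodes };
}
```

Capping after a total sort, rather than in traversal order, is what keeps the retained set independent of the order in which changed symbols are visited.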

Deterministic: canonical-JSON, all collections pre-sorted, content hash over the
placeholder-blanked envelope. Reads the graph only (no re-derived edges, no
mutation); no LLM. 11 unit + determinism tests.

Both surfaces delegate to the single @opencodehub/analysis.runChangePack core,
so they cannot disagree on values.

- MCP change_pack tool: withStore + withNextSteps + staleness + error envelope,
  registered in server.ts; structuredContent recases top-level keys plus the
  interiors of impacted_subgraph and cost_attribution to snake_case.
- CLI codehub change-pack: commander registration, --base/--head/--depth/
  --min-confidence/--budget/--include-tests-in-subgraph/--json; --json is a pure
  passthrough of the raw ChangePack; exit code = verdict.exitCode so CI gate
  semantics match codehub verdict.
- First CLI<->MCP parity test in the repo, split to respect package rootDir:
  the CLI test asserts --json deep-equals the raw pack (passthrough); the MCP
  test asserts toStructured recases losslessly (recase-back == raw pack), so
  together they prove both surfaces serialize identical values.
- server roster contract bumped 28 -> 29 (change_pack added).

The ERPAVal spec + derived task list driving the change_pack feature: the four
output sections, CLI/MCP parity AC, determinism/byte-identity, affected-test
selection semantics, and the char-heuristic cost-attribution decision (tokenizer
is provenance-only in the repo, so cost figures are explicitly estimates).

Two durable ERPAVal lessons from the change-pack session + INDEX pointers:
- tokenizerId is provenance, not an encoder (cost features need a labeled
  heuristic or a real tokenizer dep; the repo ships neither today).
- CLI<->MCP parity via one shared analysis core + a split parity test that
  respects the mcp package rootDir; the server roster contract bumps by design.

…ibution

Replaces the char-heuristic estimate with real BPE token counts via
`gpt-tokenizer` (pure-JS, MIT, zero-dep). Chose it over the native/WASM
`tiktoken` package to honor the no-native-binding rail (ADR 0015); the
`gpt-tokenizer/encoding/o200k_base` subpath bundles its BPE ranks inline, so
`encode` is synchronous and deterministic — byte-identity and the content hash
hold.

- `countTokens` encodes via o200k_base, falling back to `max(1, ceil(len/4))`
  only on input that throws (rare, still deterministic) so cost never crashes
  the pack.
- `CostAttribution` now reports `estimate: false` /
  `tokenizerModel: "openai/o200k_base"`; the type widens `estimate` to boolean
  and `tokenizerModel` to string. The field is folded into the hash, so the
  basis change is visible in `changePackHash`.
- CLI + MCP summaries drop the "(est.)" hedge and name the tokenizer.
- Tests, fixtures, EARS spec U6, and the tokenizer lesson updated to the real
  encoder; license allowlist + OSV clean (gpt-tokenizer is MIT, no advisories).

The CLI bundles `@opencodehub/*` source (noExternal) but keeps third-party
deps external, resolving them from the CLI's own node_modules at runtime. The
change-pack o200k counter added `gpt-tokenizer` to @opencodehub/analysis but
not to the CLI, so `node dist/index.js analyze` threw "Cannot find package
'gpt-tokenizer'" in CI (the self-scan + every fresh-install matrix leg) while
passing locally via the pnpm workspace hoist. Re-declare it on the CLI like the
other analysis third-party deps (@iarna/toml etc.). Per the
tsup-collapse-monorepo lesson: verify from the built dist, not the hot
workspace.

…stic

The cost-attribution tests key an in-memory file map with POSIX paths, but
runChangePack joins repoPath + relPath with the OS separator — so on Windows
the lookup arrived backslash-separated, missed the fixture, and the blind
baseline computed as 0 (two failing assertions on windows-latest). Normalize
the seam's lookup key to forward slashes so the fake reader matches on every
platform. Production fs reads are unaffected (real paths use the right
separator); this is a test-fixture fix only.
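
The fixture fix described above amounts to normalizing the lookup key before consulting the in-memory map. A minimal sketch follows; `toPosixKey` and `makeFakeReader` are hypothetical names, and the file map contents are invented.

```typescript
// Convert any mix of separators to forward slashes for use as a lookup key.
function toPosixKey(p: string): string {
  return p.replace(/\\/g, "/");
}

// Fake reader over an in-memory file map keyed by POSIX-style paths.
function makeFakeReader(files: Map<string, string>): (absPath: string) => string | undefined {
  return (absPath) => files.get(toPosixKey(absPath));
}

const files = new Map([["repo/src/a.ts", "export const a = 1;"]]);
const read = makeFakeReader(files);

// A path joined with the Windows separator still resolves through the reader.
const windowsStylePath = "repo\\src\\a.ts";
const contents = read(windowsStylePath);
```
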
@theagenticguy theagenticguy merged commit 4e5e705 into main Jun 14, 2026
38 checks passed
@theagenticguy theagenticguy deleted the feat/change-pack branch June 14, 2026 18:20
@github-actions github-actions Bot mentioned this pull request Jun 14, 2026
theagenticguy pushed a commit that referenced this pull request Jun 14, 2026
🤖 Automated release via release-please
---


<details><summary>root: 0.9.1</summary>

## [0.9.1](root-v0.9.0...root-v0.9.1) (2026-06-14)


### Features

* diff-scoped change-pack (impacted subgraph + affected tests + cost
estimate) with CLI/MCP parity
([#234](#234))
([4e5e705](4e5e705))


### Bug Fixes

* **ingestion,cli:** make a broken parser fail loud, not silently
produce a symbol-free graph
([#204](#204))
([94b9165](94b9165))
</details>

<details><summary>cli: 0.9.1</summary>

## [0.9.1](cli-v0.9.0...cli-v0.9.1) (2026-06-14)


### Features

* diff-scoped change-pack (impacted subgraph + affected tests + cost
estimate) with CLI/MCP parity
([#234](#234))
([4e5e705](4e5e705))


### Bug Fixes

* **ingestion,cli:** make a broken parser fail loud, not silently
produce a symbol-free graph
([#204](#204))
([94b9165](94b9165))
</details>

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>